
    Improving the energy efficiency of autonomous underwater vehicles by learning to model disturbances

    Energy efficiency is one of the main challenges for long-term autonomy of AUVs (Autonomous Underwater Vehicles). We propose a novel approach for improving the energy efficiency of AUV controllers based on the ability to learn which external disturbances can safely be ignored. The proposed learning approach uses adaptive oscillators that are able to learn online the frequency, amplitude and phase of zero-mean periodic external disturbances. Such disturbances occur naturally in open water due to waves, currents, and gravity, but can also be caused by the dynamics and hydrodynamics of the AUV itself. We formulate the theoretical basis of the approach, and demonstrate its abilities on a number of input signals. Further experimental evaluation is conducted using a dynamic model of the Girona 500 AUV in simulation in two important underwater scenarios: hovering and trajectory tracking. The proposed approach shows significant energy-saving capabilities while at the same time maintaining high controller gains. The approach is generic and applicable not only to AUV control, but also to other types of control where periodic disturbances exist and could be accounted for by the controller. © 2013 IEEE
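    The adaptive-oscillator idea described above can be sketched in a few lines. The single-harmonic form, the gains, and the synthetic 3 rad/s disturbance below are illustrative assumptions rather than the paper's exact formulation: the oscillator tracks a zero-mean sinusoid and converges to its frequency and amplitude online.

```python
import math

def adaptive_oscillator(signal, dt=0.001, omega0=2.5, coupling=20.0, eta=2.0):
    """Track a zero-mean periodic signal with an adaptive oscillator.

    Learns the frequency (omega), amplitude (alpha) and phase (phi) of the
    input online. The gains and the single-harmonic output alpha*cos(phi)
    are illustrative assumptions, not the paper's exact formulation.
    """
    phi, omega, alpha = 0.0, omega0, 0.0
    t = 0.0
    for _ in range(int(200.0 / dt)):          # 200 s of simulated time
        c, s = math.cos(phi), math.sin(phi)
        e = signal(t) - alpha * c             # prediction error
        phi += dt * (omega - coupling * e * s)
        omega -= dt * coupling * e * s        # frequency adaptation
        alpha += dt * eta * e * c             # amplitude adaptation
        t += dt
    return omega, alpha

# Hypothetical disturbance: a unit-amplitude, 3 rad/s sinusoid
# (e.g. a wave-induced force component).
learned_omega, learned_alpha = adaptive_oscillator(lambda t: math.cos(3.0 * t))
```

    Once the prediction error vanishes, the learned component can be fed forward by the controller, so the feedback gains no longer have to fight the periodic disturbance.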

    Encoderless position control of a two-link robot manipulator

    Simultaneous discovery of multiple alternative optimal policies by reinforcement learning

    Conventional reinforcement learning algorithms for direct policy search are limited to finding only a single optimal policy. This is caused by their local-search nature, which allows them to converge only to a single local optimum in policy space, and makes them heavily dependent on the policy initialization. In this paper, we propose a novel reinforcement learning algorithm for direct policy search, which is capable of simultaneously finding multiple alternative optimal policies. The algorithm is based on particle filtering and performs global search in policy space, therefore eliminating the dependency on the policy initialization, and having the ability to find the globally optimal policy. We validate the approach on one- and two-dimensional problems with multiple optima, and compare its performance to a global random sampling method, and a state-of-the-art Expectation-Maximization-based reinforcement learning algorithm. © 2012 IEEE

    Towards improved AUV control through learning of periodic signals

    Designing a high-performance controller for an Autonomous Underwater Vehicle (AUV) is a challenging task. There are often numerous requirements, sometimes contradictory, such as speed, precision, robustness, and energy efficiency. In this paper, we propose a theoretical concept for improving the performance of AUV controllers based on the ability to learn periodic signals. The proposed learning approach is based on adaptive oscillators that are able to learn online the frequency, amplitude and phase of zero-mean periodic signals. Such signals occur naturally in open water due to waves, currents, and gravity, but can also be caused by the dynamics and hydrodynamics of the AUV itself. We formulate the theoretical basis of the approach, and demonstrate its abilities on synthetic input signals. Further evaluation is conducted in simulation with a dynamic model of the Girona 500 AUV on a hovering task.

    Direct policy search reinforcement learning based on particle filtering

    We reveal a link between particle filtering methods and direct policy search reinforcement learning, and propose a novel reinforcement learning algorithm, based heavily on ideas borrowed from particle filters. A major advantage of the proposed algorithm is its ability to perform global search in policy space and thus find the globally optimal policy. We validate the approach on one- and two-dimensional problems with multiple optima, and compare its performance to a global random sampling method, and a state-of-the-art Expectation-Maximization-based reinforcement learning algorithm.
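    The core particle-filtering idea can be sketched as follows. The hyperparameters, the one-dimensional parameter space, and the bimodal return surface below are illustrative assumptions, not the paper's setup; the sketch only shows how reward-weighted resampling performs global search in policy space.

```python
import math
import random

def pf_policy_search(reward, low, high, n_particles=300, n_iters=40, seed=0):
    """Direct policy search by particle filtering (illustrative sketch).

    Policy parameters are treated as particles and returns as importance
    weights; reward-proportional resampling plus annealed Gaussian
    perturbation concentrates the population around the globally best
    parameter. Returns are assumed non-negative.
    """
    rng = random.Random(seed)
    particles = [rng.uniform(low, high) for _ in range(n_particles)]
    noise = 0.5
    for _ in range(n_iters):
        weights = [reward(p) for p in particles]
        # Resample proportionally to return, then perturb to keep exploring.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        particles = [p + rng.gauss(0.0, noise) for p in particles]
        noise *= 0.9                      # anneal exploration noise
    return max(particles, key=reward)

# Hypothetical bimodal return surface: global optimum at theta = 2,
# a lower local optimum at theta = -2.
R = lambda th: math.exp(-(th - 2.0) ** 2) + 0.5 * math.exp(-(th + 2.0) ** 2)
best = pf_policy_search(R, -5.0, 5.0)
```

    Unlike gradient-based policy search, no initialization near the global optimum is needed: the initial particle set covers the whole parameter range.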

    Combining Local and Global Direct Derivative-free Optimization for Reinforcement Learning

    We consider the problem of optimization in policy space for reinforcement learning. While a plethora of methods have been applied to this problem, only a narrow category of them proved feasible in robotics. We consider the peculiar characteristics of reinforcement learning in robotics, and devise a combination of two algorithms from the literature of derivative-free optimization. The proposed combination is well suited for robotics, as it involves both off-line learning in simulation and on-line learning in the real environment. We demonstrate our approach on a real-world task, where an Autonomous Underwater Vehicle has to survey a target area under potentially unknown environmental conditions. We start from a given controller, which can perform the task under foreseeable conditions, and make it adaptive to the actual environment.
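    The two-stage structure can be illustrated with a minimal sketch. The specific algorithms, budgets, and the one-dimensional cost surface below are assumptions for illustration only; the abstract does not name the two derivative-free methods that are combined.

```python
import math
import random

def global_then_local(cost, low, high, n_global=200, n_local=200, seed=1):
    """Two-stage derivative-free optimization (illustrative sketch).

    Stage 1 mimics off-line learning in simulation: cheap global random
    sampling over the parameter range. Stage 2 mimics on-line refinement
    in the real environment: conservative local stochastic hill climbing
    from the best global sample, accepting only improvements.
    """
    rng = random.Random(seed)
    # Stage 1: global random sampling finds the right basin.
    best = min((rng.uniform(low, high) for _ in range(n_global)), key=cost)
    # Stage 2: local refinement with a shrinking perturbation.
    step = (high - low) * 0.05
    for _ in range(n_local):
        cand = best + rng.gauss(0.0, step)
        if cost(cand) < cost(best):
            best = cand
        else:
            step *= 0.97
    return best

# Hypothetical cost with a global minimum at x = 1 and a shallower
# local minimum at x = -2.
f = lambda x: -(math.exp(-(x - 1.0) ** 2) + 0.6 * math.exp(-(x + 2.0) ** 2))
x_star = global_then_local(f, -4.0, 4.0)
```

    The improvement-only acceptance rule in stage 2 matches the robotics constraint that on-line exploration on the real vehicle must stay conservative.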

    Kinematic-free position control of a 2-DOF planar robot arm

    This paper challenges the well-established assumption in robotics that in order to control a robot it is necessary to know its kinematic information, that is, the arrangement of links and joints, the link dimensions and the joint positions. We propose a kinematic-free robot control concept that does not require any prior kinematic knowledge. The concept is based on our hypothesis that it is possible to control a robot without explicitly measuring its joint angles, by measuring instead the effects of the actuation on its end-effector. We implement a proof-of-concept encoderless robot controller and apply it for the position control of a physical 2-DOF planar robot arm. The prototype controller is able to successfully control the robot to reach a reference position, as well as to track a continuous reference trajectory. Notably, we demonstrate how this novel controller can cope with something that traditional control approaches fail to do: adapt to drastic kinematic changes such as 100% elongation of a link, 35-degree angular offset of a joint, and even a complete overhaul of the kinematics involving the addition of new joints and links.
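    One way to realise such a controller is to estimate the command-to-motion mapping online from observed end-effector displacements only. The Broyden-style update below is an illustrative reconstruction under that assumption, not necessarily the paper's exact method; the forward-kinematics function stands in for the physical arm plus the external sensor, and the controller never reads the joint angles.

```python
import math

def fk(q, l1=1.0, l2=1.0):
    """Hypothetical 2-DOF planar arm: stands in for the robot plus the
    external sensor measuring the end-effector; the controller below
    never accesses q directly."""
    return [l1 * math.cos(q[0]) + l2 * math.cos(q[0] + q[1]),
            l1 * math.sin(q[0]) + l2 * math.sin(q[0] + q[1])]

def encoderless_reach(target, q0=(0.3, 0.5), iters=300, gain=0.3):
    """Kinematic-free position control via an online-estimated Jacobian."""
    q, x = list(q0), fk(list(q0))
    # Probe each actuator with a small command to initialise the estimate
    # purely from observed end-effector displacement.
    J, eps = [[0.0, 0.0], [0.0, 0.0]], 1e-3
    for j in range(2):
        qp = list(q)
        qp[j] += eps
        xp = fk(qp)
        J[0][j] = (xp[0] - x[0]) / eps
        J[1][j] = (xp[1] - x[1]) / eps
    for _ in range(iters):
        e = [target[0] - x[0], target[1] - x[1]]
        det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
        if abs(det) < 1e-6:
            break                          # near-singular estimate
        # dq = gain * J^{-1} e  (closed-form 2x2 inverse)
        dq = [gain * ( J[1][1] * e[0] - J[0][1] * e[1]) / det,
              gain * (-J[1][0] * e[0] + J[0][0] * e[1]) / det]
        q[0] += dq[0]
        q[1] += dq[1]
        x_new = fk(q)
        dx = [x_new[0] - x[0], x_new[1] - x[1]]
        # Broyden update: correct J by the prediction error of the last move.
        denom = dq[0] ** 2 + dq[1] ** 2
        if denom > 1e-12:
            pred = [J[0][0] * dq[0] + J[0][1] * dq[1],
                    J[1][0] * dq[0] + J[1][1] * dq[1]]
            for i in range(2):
                for j in range(2):
                    J[i][j] += (dx[i] - pred[i]) * dq[j] / denom
        x = x_new
    return x

# Hypothetical reachable target inside the workspace.
goal = fk([1.0, 0.8])
reached = encoderless_reach(goal)
```

    Because the Jacobian estimate is refreshed from every observed displacement, the same loop keeps working after kinematic changes such as a lengthened link, which a fixed-model controller cannot absorb.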

    INFRAWEBS Axiom Editor - A graphical ontology-driven tool for creating complex logical expressions

    The current INFRAWEBS European research project aims at developing an ICT framework enabling software and service providers to generate and establish open and extensible development platforms for Web Service applications. One of the concrete project objectives is developing a full-life-cycle software toolset for creating and maintaining Semantic Web Services (SWSs) supporting specific applications based on the Web Service Modelling Ontology (WSMO) framework. According to WSMO, functional and behavioural descriptions of a SWS may be represented by means of complex logical expressions (axioms). The paper describes a specialized user-friendly tool for constructing and editing such axioms, the INFRAWEBS Axiom Editor. After discussing the main design principles of the Editor, its functional architecture is briefly presented. The tool is implemented using the Eclipse Graphical Editing Framework and the Eclipse Rich Client Platform.

    Robot-object contact perception using symbolic temporal pattern learning

    This paper investigates the application of machine learning to the problem of contact perception between a robot's gripper and an object. The input data comprises a multidimensional time-series produced by a force/torque sensor at the robot's wrist, the robot's proprioceptive information, namely, the position of the end-effector, as well as the robot's control command. These data are used to train a hidden Markov model (HMM) classifier. The output of the classifier is a prediction of the contact state, which includes no contact, a contact aligned with the central axis of the valve, and an edge contact. To distinguish between contact states, the robot performs exploratory behaviors that produce distinct patterns in the time-series data. The patterns are discovered by first analyzing the data using a probabilistic clustering algorithm that transforms the multidimensional data into a one-dimensional sequence of symbols. The symbols produced by the clustering algorithm are used to train the HMM classifier. We examined two exploratory behaviors: a rotation around the x-axis, and a rotation around the y-axis of the gripper. We show that using these two exploratory behaviors we can successfully predict a contact state with an accuracy of 88 ± 5% and 81 ± 10%, respectively.
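    The quantize-then-classify pipeline can be sketched compactly. As simplifications, a fixed set of centroids stands in for the probabilistic clustering step, and a first-order Markov chain over symbols stands in for the full HMM; the synthetic "edge" and "no_contact" signals below are invented for illustration.

```python
import math

def quantize(series, centroids):
    """Map each multidimensional sample to the index of its nearest centroid,
    turning the time-series into a one-dimensional symbol sequence."""
    def nearest(v):
        return min(range(len(centroids)),
                   key=lambda k: sum((a - b) ** 2
                                     for a, b in zip(v, centroids[k])))
    return [nearest(v) for v in series]

def train_markov(symbol_seqs, n_symbols, alpha=1.0):
    """Laplace-smoothed symbol-transition model for one contact class."""
    counts = [[alpha] * n_symbols for _ in range(n_symbols)]
    for seq in symbol_seqs:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1.0
    return [[c / sum(row) for c in row] for row in counts]

def log_likelihood(seq, model):
    return sum(math.log(model[a][b]) for a, b in zip(seq, seq[1:]))

def classify(seq, models):
    """Return the class label whose transition model best explains seq."""
    return max(models, key=lambda label: log_likelihood(seq, models[label]))

# Hypothetical training data: an "edge" contact oscillates between two
# force regimes, while "no_contact" stays near zero.
centroids = [[0.0, 0.0], [1.0, 1.0]]
edge = [[0.0, 0.0] if i % 2 == 0 else [1.0, 1.0] for i in range(40)]
still = [[0.05, 0.0] for _ in range(40)]
models = {
    "edge": train_markov([quantize(edge, centroids)], len(centroids)),
    "no_contact": train_markov([quantize(still, centroids)], len(centroids)),
}
predicted = classify(quantize(edge, centroids), models)
```

    The exploratory rotations in the paper play exactly this role: they force each contact state to leave a distinctive transition signature in the symbol sequence.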

    Probability redistribution using time hopping for reinforcement learning

    A method for using the Time Hopping technique as a tool for probability redistribution is proposed. Applied to reinforcement learning in a simulation, it is able to re-shape the state probability distribution of the underlying Markov decision process as desired. This is achieved by modifying the target selection strategy of Time Hopping appropriately. Experiments with a robot maze reinforcement learning problem show that the method improves the exploration efficiency by re-shaping the state probability distribution to an almost uniform distribution.
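    The redistribution effect can be illustrated with a toy sketch. The inverse-visit-count selection rule below is an assumption chosen for illustration, not necessarily the paper's exact target-selection strategy; it shows how biasing hop targets toward rarely visited states drives the visitation distribution toward uniform.

```python
import random

def inverse_count_hopping(n_states, n_hops=20000, seed=0):
    """Sketch of probability redistribution via hop-target selection.

    Time Hopping lets a simulated agent jump to an arbitrary state; here
    the target-selection strategy picks each state with probability
    inversely proportional to its visit count, which equalises visitation.
    """
    rng = random.Random(seed)
    counts = [1] * n_states            # start at 1 to avoid division by zero
    for _ in range(n_hops):
        weights = [1.0 / c for c in counts]
        state = rng.choices(range(n_states), weights=weights, k=1)[0]
        counts[state] += 1
    return counts

visits = inverse_count_hopping(10)
```

    Under-visited states get a higher selection probability on every hop, so the rule is self-correcting and the resulting counts stay nearly uniform.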